Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification
نویسندگان
چکیده
We consider the problem of classification using high dimensional features’ space. In a paper by Bickel and Levina (2004), it is recommended to use naive-Bayes classifiers, that is, to treat the features as if they are statistically independent. Consider now a sparse setup, where only a few of the features are informative for classification. Fan and Fan (2008), suggested a variable selection and classification method, called FAIR. The FAIR method improves the design of naive-Bayes classifiers in sparse setups. The improvement is due to reducing the noise in estimating the features’ means. This reduction is since that only the means of a few selected variables should be estimated. We also consider the design of naive Bayes classifiers. We show that a good alternative to variable selection is estimation of the means through a certain non parametric empirical Bayes procedure. In sparse setups the empirical Bayes implicitly performs an efficient variable selection. It also adapts very well to non sparse setups, and has the advantage of making use of the information from many “weakly informative” variables, which variable selection type of classification procedures give up on using. We compare our method with FAIR and other classification methods in simulation for sparse and non sparse setups, and in real data examples involving classification of normal versus malignant tissues based on microarray data.
منابع مشابه
Parametric Empirical Bayes Test and Its Application to Selection of Wavelet Threshold
In this article, we propose a new method for selecting level dependent threshold in wavelet shrinkage using the empirical Bayes framework. We employ both Bayesian and frequentist testing hypothesis instead of point estimation method. The best test yields the best prior and hence the more appropriate wavelet thresholds. The standard model functions are used to illustrate the performance of the p...
متن کاملEmpirical Bayes Estimation in Nonstationary Markov chains
Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...
متن کاملA novel means of using gene clusters in a two-step empirical Bayes method for predicting classes of samples
MOTIVATION The classification of samples using gene expression profiles is an important application in areas such as cancer research and environmental health studies. However, the classification is usually based on a small number of samples, and each sample is a long vector of thousands of gene expression levels. An important issue in parametric modeling for so many gene expression levels is th...
متن کاملBayesian Classification for Inspection of Industrial Products
In this paper, a real time application for visual inspection and classification of cork stoppers is presented. Each cork stopper is represented by a high dimensional set of characteristics corresponding to relevant visual features. We have applied a set of non-parametric and parametric methods in order to compare and evaluate their performance for this real problem. The best results have been a...
متن کاملتشخیص سرطان پستان با استفاده از برآورد ناپارمتری چگالی احتمال مبتنی بر روشهای هستهای
Introduction: Breast cancer is the most common cancer in women. An accurate and reliable system for early diagnosis of benign or malignant tumors seems necessary. We can design new methods using the results of FNA and data mining and machine learning techniques for early diagnosis of breast cancer which able to detection of breast cancer with high accuracy. Materials and Methods: In this study,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 10 شماره
صفحات -
تاریخ انتشار 2009